We introduce a convergent iterative algorithm for finding the optimal coding and decoding
operations for an arbitrary noisy quantum channel. This algorithm does not require any error syndrome
to be corrected completely, and hence also finds codes outside the usual Knill-Laflamme definition 
of error correcting codes. The iteration is shown to improve the figure of merit "channel fidelity" in 
every step. 
INTRODUCTION

From the beginning of the development of quantum 
information theory, it was recognized that without suit- 
able error correcting procedures the fantastic promises of 
this new discipline, such as the exponential speedup in 
Shor's algorithm, or the possibility of long range quan- 
tum communication and secure key exchange would never 
be realizable. Therefore, the development of the first
error correcting codes 0, and the subsequent more systematic
theory by Knill and Laflamme [3] were crucial
achievements. It became clear that although naive clas- 
sical ideas, like redundant transmission and majority rule 
decisions on the outputs, are ruled out by the no-cloning 
theorem, techniques from classical coding theory (e.g., 
additive codes) could be used to construct good quan- 
tum codes as well. The quantum codes constructed in 
this way share with their classical counterparts the com- 
binatorial/algebraic flavor. They are designed to correct 
a certain finite dimensional subspace of errors, such as 
errors occurring on only a small number of the parallel 
channels employed. If the space of corrected errors is 
suitably chosen, such codes can also be used to correct 
generic small errors, i.e., one can show that any channel 
close to the ideal channel can be corrected with small 
overhead 0. 

However, for errors of fixed finite size it is not at all 
clear that the special form of Knill-Laflamme codes al- 
lows the most efficient error correction. Alternative codes 
might not correct any error completely, but in exchange 
might improve correction of the errors ignored by the 
Knill-Laflamme codes, resulting in an improved overall 
performance. Consider, for example, the famous five bit
code, applied to the five-fold tensor product of a
depolarizing qubit channel with a certain depolarization
probability p. Figure 1 shows the fidelity achieved by this
code as a function of p, together with the same parameter
without any correction. For $p > 1 - \sqrt{2/3} \approx 18\%$ the
performance of the five bit code is actually worse than 
doing no correction operation at all. It seems implausi- 
ble that the best code should jump from using all qubits 
to using only one at the crossover point, which suggests 
looking for better codes in that area. 

FIG. 1: Fidelity of the five bit code applied to the 5-fold tensor
product of the depolarizing qubit channel (solid line) compared
to the fidelity of the depolarizing channel (dashed line).

In this Letter we develop a method which allows us to
search numerically for an optimal code adapted to arbitrary
noise. Thus, for any given noisy channel T (not
necessarily a product $T = S^{\otimes n}$ of channels operating
independently on n smaller systems), we look for an encoding
channel E and a decoding channel D, with suitable
domain and range, such that DTE comes as close as possible
to the ideal channel on a fixed d-level system. In
contrast to Knill-Laflamme theory we make no assump- 
tions on the coding and decoding channels E and D. The 
basis of the method is an iteration by which either E or 
D is changed, such that fidelity is improved in each step.
The results are related to Knill-Laflamme theory as fol- 
lows: 

1. Surprisingly, the codes in Fig. 1 turn out to be optimal
already: up to the critical value of the depolarization
probability the five-bit code is optimal, and
beyond that the best way of using up to five bit 
encodings is to do nothing. However, this has little 
bearing on general channels, since the depolarizing 
channels are highly symmetric. 

2. The encoding operation comes out to be an isome- 
try even on random channels. This is a basic fea- 
ture of Knill-Laflamme theory. 

3. Sometimes the Knill-Laflamme theory applies, but 
not, as is usually done, to the correction of local- 
ized errors. Instead, certain non-localized errors 
are corrected. Such instances are also known in the
decoherence-free subspace approach Q, and are reliably
found by our method, since we do not use
a tensor product decomposition of T in the first 
place. 

4. Sometimes Knill-Laflamme theory fails entirely in 
the sense that no error at all is corrected completely:
although E, D optimally correct a channel
T, there might be no channel T' such that DT'E 
is a multiple of the identity. 



OVERVIEW OF THE METHOD 

Quantum information theory describes computation 
in terms of preparation, processing and measurements. 
We consider only finite dimensional quantum systems, 
i.e., systems whose observable algebra is of the form 
$\mathcal{B}(\mathcal{H})$, the linear operators on a finite dimensional Hilbert
space $\mathcal{H}$. The quantum states, which physically describe
the preparation process, are given by density operators
$\rho$ in $\mathcal{B}(\mathcal{H})$. Measurements are given by self-adjoint
operators on $\mathcal{H}$, or, more generally, by positive
operator-valued measures. Processing operations, e.g. 
the free time evolution, a computation or a noisy trans- 
mission, are described by channels. These can either 
be considered as a modification of all subsequent mea- 
surements (Heisenberg picture), or as a modification of 
the preparation (Schrödinger picture). In this article we
choose the latter option, i.e., channels are mathematically
given by completely positive trace preserving operators
$S: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$, where $\mathcal{H}_1$ is the Hilbert
space of the input systems, $\mathcal{H}_2$ describes the output
systems, and $S(\rho)$ is the state obtained by sending the
input state $\rho$ through the channel. The encoding and decoding
operations of an error correction scheme are also
channels in this sense, with appropriate choices of input
and output Hilbert spaces. Every channel $S$ has a Kraus
representation $S(\rho) = \sum_i s_i \rho s_i^*$, with $s_i: \mathcal{H}_1 \to \mathcal{H}_2$, and
$\sum_i s_i^* s_i = \mathbf{1}$. When $\mathcal{H}_1 = \mathcal{H}_2$, the 'noisiness' of $S$
is, loosely speaking, its distance from the ideal channel. 
There are many different ways of expressing this quantitatively.
In this Letter we use a special case of Schumacher's
entanglement fidelity, the channel fidelity.
It is defined as 
$$F_c(S) = \langle\Omega|\,(\mathrm{id}\otimes S)(|\Omega\rangle\langle\Omega|)\,|\Omega\rangle
         = (\dim\mathcal{H}_1)^{-2}\sum_i |\mathrm{tr}(s_i)|^2 , \qquad (1)$$

where $|\Omega\rangle = (\dim\mathcal{H}_1)^{-1/2}\sum_k |kk\rangle$ is the standard maximally
entangled unit vector in $\mathcal{H}_1\otimes\mathcal{H}_1$ and id is the
identity channel on $\mathcal{B}(\mathcal{H}_1)$. This quantity is 1 if and only
if the channel is ideal, and is directly related to the mean
fidelity for pure input states [9].
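
The two expressions for the channel fidelity can be compared numerically. The following is a minimal sketch, assuming NumPy and a channel given as a list of square Kraus matrices; the function names and the Pauli-form depolarizing test channel are illustrative choices, not part of the original text.

```python
import numpy as np

def channel_fidelity_kraus(kraus):
    """F_c(S) = d^-2 * sum_i |tr(s_i)|^2 for Kraus operators s_i on a d-level system."""
    d = kraus[0].shape[0]
    return sum(abs(np.trace(s)) ** 2 for s in kraus) / d**2

def channel_fidelity_omega(kraus):
    """F_c(S) = <Omega|(id (x) S)(|Omega><Omega|)|Omega>, evaluated directly."""
    d = kraus[0].shape[0]
    omega = np.eye(d).reshape(d * d) / np.sqrt(d)       # sum_k |kk> / sqrt(d)
    proj = np.outer(omega, omega.conj())
    out = sum(np.kron(np.eye(d), s) @ proj @ np.kron(np.eye(d), s).conj().T
              for s in kraus)
    return float(np.real(omega.conj() @ out @ omega))

def depolarizing_kraus(p):
    """Illustrative test channel: rho -> p*tr(rho)*1/2 + (1-p)*rho in Pauli Kraus form."""
    I = np.eye(2)
    X = np.array([[0, 1], [1, 0]])
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.diag([1, -1])
    return [np.sqrt(1 - 3 * p / 4) * I] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

kraus = depolarizing_kraus(0.2)
f_kraus = channel_fidelity_kraus(kraus)   # only the identity term has non-zero trace
f_omega = channel_fidelity_omega(kraus)
```

Since the Pauli matrices are traceless, only the identity term contributes, so both functions should return 1 - 3p/4 for this test channel.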



The problem of finding an optimal code for a given 
channel $T: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ is now the construction of
an encoding channel $E: \mathcal{B}(\mathcal{H}_0) \to \mathcal{B}(\mathcal{H}_1)$ and a decoding
channel $D: \mathcal{B}(\mathcal{H}_2) \to \mathcal{B}(\mathcal{H}_0)$ such that $F_c(DTE)$ becomes
maximal. This is always a fairly high dimensional
search problem. For example, if $\mathcal{H}_0$ is a single qubit,
and T is the five-fold tensor power of a given noisy channel
($\dim\mathcal{H}_1 = 32$), i.e., the case considered in Figure 1,
the description of D and E together requires some 7000
parameters. General purpose optimization routines will
usually choke on this, and there is only a chance if special
properties of this variational problem can be brought to
bear.
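
One way to arrive at a parameter count of this size is sketched below. It assumes (this counting is not spelled out in the text) that a channel from a d_in-level to a d_out-level system is described by a Hermitian matrix of dimension d_in*d_out, with d_in^2 real constraints from trace preservation; the function name is hypothetical.

```python
def channel_parameters(d_in, d_out):
    """Real parameters of a channel B(H_in) -> B(H_out) under the stated assumption:
    a Hermitian (d_in*d_out)-dimensional matrix has (d_in*d_out)^2 real parameters,
    and trace preservation removes d_in^2 of them."""
    return (d_in * d_out) ** 2 - d_in ** 2

encoder = channel_parameters(2, 32)   # E: one qubit into five qubits
decoder = channel_parameters(32, 2)   # D: five qubits back to one qubit
total = encoder + decoder
```

With these assumptions the total is 4092 + 3072 = 7164, consistent with the order of magnitude quoted above.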

What we use in the present Letter is that the functionals
$E \mapsto F_c(DTE)$ and $D \mapsto F_c(DTE)$ are both
linear, and take positive values on completely positive 
operators. The iteration procedure described in the next 
section finds a maximum of any functional with these 
properties. The overall maximization then proceeds in seesaw
fashion, by applying the iteration first with a fixed
random E, optimizing the fidelity over D, then fixing D
and optimizing E, and so on. Since every step of the iter- 
ation is proved to increase fidelity and each stable fixed 
point of the single iteration is a global maximum, this 
procedure is guaranteed to find at least a locally optimal 
pair of encoder and decoder. 



THE BASIC ITERATION 

The iteration we consider is a close relative of the 
power method for finding the eigenvector for the largest 
eigenvalue of a positive semi-definite matrix A. This 
method starts with a random unit vector $\phi_0$, and each
step consists of applying A, and normalizing, i.e.,
$\phi_{n+1} = A\phi_n / \|A\phi_n\|$. It is easy to see that the convergence of
this algorithm is exponential with a rate determined by 
the gap to the next largest eigenvalue. Moreover, the 
inequality 



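$$\langle\phi|A^3\phi\rangle \;\ge\; \langle\phi|A\phi\rangle\,\langle\phi|A^2\phi\rangle , \qquad (2)$$

which is valid for any positive semi-definite Hilbert space
operator A and any unit vector $\phi$, shows that convergence is
monotone, in the sense that $\langle\phi_n|A\phi_n\rangle$ is a non-decreasing sequence.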
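
The monotone behaviour of the power method can be observed in a few lines. This is a minimal sketch, assuming NumPy; the test matrix with eigenvalues 5, 3, 2, 1, 0.5, 0 and the random seeds are arbitrary illustrative choices.

```python
import numpy as np

def power_method(A, steps=100, seed=0):
    """Power iteration phi_{n+1} = A phi_n / ||A phi_n||; also records <phi_n|A phi_n>."""
    rng = np.random.default_rng(seed)
    phi = rng.normal(size=A.shape[0])
    phi = phi / np.linalg.norm(phi)
    values = []
    for _ in range(steps):
        values.append(float(phi @ A @ phi))   # expectation value before the step
        phi = A @ phi
        phi = phi / np.linalg.norm(phi)
    return phi, values

# Illustrative positive semi-definite matrix with known eigenvalues.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))          # random orthogonal matrix
A = Q @ np.diag([5.0, 3.0, 2.0, 1.0, 0.5, 0.0]) @ Q.T
phi, values = power_method(A)
```

The recorded expectation values should form a non-decreasing sequence approaching the largest eigenvalue, with the error shrinking at a rate set by the ratio of the two largest eigenvalues (here 3/5).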

Suppose now that we want to find a channel $S$ maximizing
a linear objective functional $f$, which is defined
on arbitrary operators $S: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$, and positive
on all completely positive maps. Note that $f(S)$ is bilinear
in the Kraus operators $s_i$ of $S$. So in a sense we
will presently make precise, the objective functional $f$ is
analogous to the matrix element $\langle\phi_S|F\phi_S\rangle$ of a positive
operator $F$ associated with $f$, where the vector $\phi_S$ corresponds
to the set of Kraus operators $s_i$. In our iteration
we apply the operator $F$ to each Kraus operator and
get a modified completely positive map. This will not
be a channel, because it is not trace preserving. Hence
we have to include a normalization step. Since the normalization
of a completely positive map is given by an
operator (not a scalar) one cannot simply "divide by the
normalization". We show, however, how to do the normalization
in such a way that the desirable features of
the power method do carry over.

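Let us now make these ideas precise. By $\mathcal{L}^2(\mathcal{H}_1,\mathcal{H}_2)$
we denote the space of Hilbert-Schmidt operators from
$\mathcal{H}_1$ to $\mathcal{H}_2$ with scalar product $\langle\langle x|y\rangle\rangle = \mathrm{tr}(x^* y)$. Then
if $|\mu\rangle$, $\mu = 1, \ldots, \dim\mathcal{H}_1$ denote the vectors of a basis
of $\mathcal{H}_1$, we associate with any map $S: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ an
operator $\hat{S} \in \mathcal{B}(\mathcal{L}^2(\mathcal{H}_1,\mathcal{H}_2))$ by

$$\hat{S}(x) = \sum_{\mu,\nu} S(|\mu\rangle\langle\nu|)\, x\, |\nu\rangle\langle\mu| . \qquad (3)$$

In fact, this is just a reshuffling of matrix elements since
$\langle a|S(|\mu\rangle\langle\nu|)|b\rangle = \langle a|\hat{S}(|b\rangle\langle\nu|)|\mu\rangle$. The key feature of the
correspondence $S \leftrightarrow \hat{S}$, also known as the Jamiołkowski
duality, is that $S$ is completely positive if and only if $\hat{S}$
is a positive semi-definite operator on the Hilbert space
$\mathcal{L}^2(\mathcal{H}_1,\mathcal{H}_2)$. Indeed, the Kraus decomposition of $S$ translates
directly into

$$\hat{S} = \sum_i |s_i\rangle\rangle\langle\langle s_i| , \qquad (4)$$

and the operators with such a representation are precisely
the positive semi-definite operators on $\mathcal{L}^2(\mathcal{H}_1,\mathcal{H}_2)$.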
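
The correspondence between a map and its associated operator can be checked numerically. The sketch below assumes NumPy, represents operators from the input to the output space as d_out x d_in matrices flattened row by row, and uses a randomly generated test channel; all function names are illustrative.

```python
import numpy as np

def random_channel(d_in, d_out, n_kraus, seed=0):
    """Random Kraus operators s_i (d_out x d_in) with sum_i s_i^dagger s_i = 1 (test input only)."""
    rng = np.random.default_rng(seed)
    g = [rng.normal(size=(d_out, d_in)) + 1j * rng.normal(size=(d_out, d_in))
         for _ in range(n_kraus)]
    w, v = np.linalg.eigh(sum(x.conj().T @ x for x in g))
    inv_sqrt = (v / np.sqrt(w)) @ v.conj().T          # inverse square root of the sum
    return [x @ inv_sqrt for x in g]

def apply_channel(kraus, rho):
    """S(rho) = sum_i s_i rho s_i^dagger."""
    return sum(s @ rho @ s.conj().T for s in kraus)

def s_hat_matrix(kraus):
    """Matrix of sum_i |s_i>><<s_i| acting on flattened d_out x d_in operators."""
    return sum(np.outer(s.reshape(-1), s.reshape(-1).conj()) for s in kraus)

def s_hat_apply(kraus, x):
    """Basis formula: sum over mu, nu of S(|mu><nu|) x |nu><mu|."""
    d_out, d_in = x.shape
    out = np.zeros((d_out, d_in), dtype=complex)
    for mu in range(d_in):
        for nu in range(d_in):
            e = np.zeros((d_in, d_in))
            e[mu, nu] = 1.0                           # |mu><nu|
            out += apply_channel(kraus, e) @ x @ e.T  # e.T is |nu><mu|
    return out

kraus = random_channel(2, 3, n_kraus=3)
x = np.arange(6, dtype=complex).reshape(3, 2)
S_hat = s_hat_matrix(kraus)
```

Applying `S_hat` to the flattened `x` should agree with `s_hat_apply(kraus, x)`, and the eigenvalues of `S_hat` should be non-negative, as expected for a completely positive map.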

The objective functional $f(S)$ can now be written in
terms of $\hat{S}$, and thus becomes a positive linear functional
on the positive operators on $\mathcal{L}^2(\mathcal{H}_1,\mathcal{H}_2)$. But such functionals
are themselves given by positive operators: there
is a positive semi-definite $F$ such that

$$f(S) = \mathrm{tr}(F\hat{S}) = \sum_i \langle\langle s_i|F s_i\rangle\rangle , \qquad (5)$$

where at the second equality we have inserted Eq. (4).

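In each iteration we define a new completely positive
map $S'$ by applying $F$ to each $s_i$, i.e., $s_i' = F(s_i)$, or

$$\hat{S}' = \sum_i |F s_i\rangle\rangle\langle\langle F s_i| = F \hat{S} F . \qquad (6)$$

Clearly, $S'$ is usually not trace preserving. Instead, we
have $\mathrm{tr}(S'(\rho)) = \mathrm{tr}(M\rho)$, where

$$M = \sum_i s_i'^{*} s_i' . \qquad (7)$$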

In order to normalize the channel we therefore multiply
each $s_i'$ with the suitable power of $M$: if $M$ is non-singular,
we set $t_i = s_i' M^{-1/2}$, so $\sum_i t_i^* t_i = \mathbf{1}$. These will
be the Kraus operators of the next iterate $S_+$, i.e., the
overall iteration step is

$$S \mapsto S_+ , \qquad S_+(\rho) = S'(M^{-1/2}\rho M^{-1/2}) , \qquad (8)$$



with $S'$ determined by Eq. (6). In the applications below
$M$ is always invertible. But when $M$ is singular,
we can still take $M^{-1/2}$ as the pseudo-inverse, and the
channel $S_+$ becomes normalized to a projection, i.e., it
is trace preserving only for input density matrices on the
support subspace of $M$ and annihilates density operators
supported on the complement.
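
A single iteration step can be sketched as follows, assuming NumPy and the same flattened-matrix representation of Kraus operators as a plain array. The positive operator `F` and the starting channel are random test data, and the function names are hypothetical.

```python
import numpy as np

def inv_sqrt(M, tol=1e-12):
    """(Pseudo-)inverse square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(M)
    scale = np.array([1.0 / np.sqrt(x) if x > tol else 0.0 for x in w])
    return (v * scale) @ v.conj().T

def objective(F, kraus):
    """f(S) = sum_i <<s_i|F s_i>> with each s_i flattened row by row."""
    return float(sum(np.vdot(s.reshape(-1), F @ s.reshape(-1)).real for s in kraus))

def iteration_step(F, kraus):
    """Apply F to every Kraus operator, then renormalize from the right with M^(-1/2)."""
    shape = kraus[0].shape
    s_prime = [(F @ s.reshape(-1)).reshape(shape) for s in kraus]
    M = sum(s.conj().T @ s for s in s_prime)      # normalization operator on the input space
    R = inv_sqrt(M)
    return [s @ R for s in s_prime]

rng = np.random.default_rng(0)
d_in, d_out, n_kraus = 2, 3, 4
G = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
F = G @ G.conj().T                                # random positive semi-definite test operator
start = [rng.normal(size=(d_out, d_in)) + 1j * rng.normal(size=(d_out, d_in))
         for _ in range(n_kraus)]
kraus = [s @ inv_sqrt(sum(x.conj().T @ x for x in start)) for s in start]

values = [objective(F, kraus)]
for _ in range(50):
    kraus = iteration_step(F, kraus)
    values.append(objective(F, kraus))
```

In this sketch every iterate stays trace preserving and the recorded objective values should not decrease from one step to the next. No stability test of fixed points is included here.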

The properties of this iteration resemble those of 
the power method (which is, in fact, the special case 
$\dim\mathcal{H}_2 = 1$). Most importantly, one gets an improvement
of the objective functional in every step:
$f(S_+) \ge f(S)$. The proof is based on inequality (2), for an operator
A depending on the normalization correction M,
which hence changes in every step. As for the power
method, there may be non-maximal fixed points of the iteration,
corresponding to non-maximal eigenvalues of A.
However, they are all unstable: a small random perturbation
of such a fixed point is sufficient to get the iteration
going again, finding strictly higher $f(S)$. Therefore, testing
the stability of any fixed point found is included in
the general algorithm.

We have proved that this "stabilized" iteration does 
converge to the global maximum, provided that the ini- 
tial number of Kraus operators is sufficiently large to 
allow representation of arbitrary channels for the given 
dimensions. In that case convexity guarantees that there 
are no sub-optimal local maxima, and the linear stability 
analysis of the iteration shows that all stable points are 
indeed local maxima. Note that our iteration without 
stabilization never increases the number of Kraus opera- 
tors, so we can also find local maxima with such a con- 
straint, e.g., the constraint that encoding uses only one 
isometry. 



APPLICATION TO QUANTUM ERROR 
CORRECTION 

As mentioned above, we will optimize the overall fidelity
of the corrected channel $F_c(DTE)$ by alternately
fixing the decoding D and optimizing the encoder E by 
the iteration method, and fixing E and optimizing D. 
Since both kinds of steps increase fidelity, this procedure 
converges to an optimum. All results reported below were 
computed by starting from various random initial confi- 
gurations. The iteration was stopped when the gain of 
fidelity was below some threshold. 
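
The alternating optimization can be sketched compactly in terms of Kraus operators. The code below is a minimal illustration, assuming NumPy. The test channel (two copies of a qubit depolarizing channel in Pauli form), a single encoder Kraus operator, the number of sweeps and all function names are assumptions made for this example, and no stability test of fixed points is performed. For fixed partners a, the operator applied to a Kraus operator s is proportional to the sum over a of tr(a s) a^dagger; a constant prefactor is dropped because the normalization step removes it.

```python
import numpy as np

def inv_sqrt(M, tol=1e-12):
    """(Pseudo-)inverse square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(M)
    scale = np.array([1.0 / np.sqrt(x) if x > tol else 0.0 for x in w])
    return (v * scale) @ v.conj().T

def normalize(ops):
    """Rescale s_i -> s_i M^(-1/2) so that sum_i s_i^dagger s_i = 1."""
    R = inv_sqrt(sum(s.conj().T @ s for s in ops))
    return [s @ R for s in ops]

def fidelity(D, T, E):
    """Channel fidelity of DTE: d^-2 * sum |tr(d_i t_j e_k)|^2."""
    d = E[0].shape[1]
    return sum(abs(np.trace(dk @ t @ e)) ** 2 for dk in D for t in T for e in E) / d**2

def improve(ops, partners, steps=5):
    """Basic iteration for f(S) = sum_{i,a} |tr(a s_i)|^2 with the operators a held fixed."""
    for _ in range(steps):
        ops = normalize([sum(np.trace(a @ s) * a.conj().T for a in partners) for s in ops])
    return ops

def seesaw(T, d0, n_enc=1, sweeps=30, seed=0):
    """Alternate between optimizing the decoder D and the encoder E."""
    rng = np.random.default_rng(seed)
    d2, d1 = T[0].shape

    def rand(n, rows, cols):
        return [rng.normal(size=(rows, cols)) + 1j * rng.normal(size=(rows, cols))
                for _ in range(n)]

    E = normalize(rand(n_enc, d1, d0))
    D = normalize(rand(d0 * d2, d0, d2))
    history = [fidelity(D, T, E)]
    for _ in range(sweeps):
        D = improve(D, [t @ e for t in T for e in E])      # fix E, optimize D
        history.append(fidelity(D, T, E))
        E = improve(E, [dk @ t for dk in D for t in T])    # fix D, optimize E
        history.append(fidelity(D, T, E))
    return E, D, history

def depolarizing_kraus(p):
    """Illustrative qubit test channel in Pauli Kraus form."""
    paulis = [np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]), np.diag([1, -1])]
    return [np.sqrt(1 - 3 * p / 4) * np.eye(2)] + [np.sqrt(p / 4) * P for P in paulis]

single = depolarizing_kraus(0.3)
T = [np.kron(a, b) for a in single for b in single]        # two independent copies
E, D, history = seesaw(T, d0=2)
```

The list `history` records the fidelity after every half-sweep and should be non-decreasing. Instead of a fixed number of sweeps, one could stop once the gain between consecutive entries falls below a chosen threshold.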



Depolarizing Channel

The procedure is applied to the depolarizing qubit
channel with parameter p, i.e.,

$$T_p(\rho) = p\,\mathrm{tr}(\rho)\,\tfrac{1}{2}\mathbf{1} + (1-p)\,\rho . \qquad (9)$$

For $0 \le p \le 1$ this channel totally depolarizes the input
system with probability p and leaves the input system
untouched with probability (1 - p). The importance of
this channel lies in its role as the worst case (the most
mixed channel), whenever only a lower bound on the fidelity
of a channel is known. The correction scheme for
the depolarizing channel will then correct all such channels,
to at least the same fidelity, even if further details
are unknown. However, due to its high symmetry this
channel is rather special (see Subsect. C below).

We will look at the fivefold tensor power of the depolarizing
channel, since for fewer copies of the channel the
optimal correction strategy turns out to be to do no correction
at all, i.e., to copy the input to one of the output
qubits and discard the rest. For five bits we have the
standard five-bit stabilizer code which we denote
by $(E_5, D_5)$. Its performance, given by the polynomial

$$F_c(D_5 T_p^{\otimes 5} E_5) = 1 - \tfrac{45}{8}p^2 + \tfrac{75}{8}p^3 - \tfrac{45}{8}p^4 + \tfrac{9}{8}p^5 , \qquad (10)$$

is shown in Figure 2. Surprisingly, the optimal codes
determined by our method fall exactly on the known
lines: the five-bit code up to the cross-over point
$p = 1 - \sqrt{2/3} \approx 0.18$, and doing nothing for the range up to
p = 1. This is very surprising in view of the fact that the
five bit code is not at all designed to give good results for
large errors, but only to eliminate the linear term in (10).

FIG. 2: Comparison of the channel fidelity of no error correction
(dotted line), five bit code (dashed lines) and the iteration
(solid line) applied to the 5-fold tensor product of depolarizing
channel with parameter p. For p > 1 also the fidelity for
five-bit encoder combined with optimized decoder is shown.
[Plot not reproduced; horizontal axis: parameter p of the depolarizing channel.]


Random Channels

Moving away from the highly symmetric channels, we
have considered random channels generated either with
independent uniformly distributed entries, followed by
normalization, or as convex combinations of such channels
with the identity. In either case one sees that, generically,
the optimized codes never correct a single syndrome.
In contrast to the known limitations of Knill-Laflamme
codes, even for four encoding bits one often
gets an improvement of the fidelity. More precisely, the
fidelity after coding tends to increase (though often not
by much) with every additional encoding qubit.

On the other hand, one feature of Knill-Laflamme theory
is typically shared by the optimized codes: the encoding
E is isometric, i.e., it is given by a single Kraus
operator. While it is known that this choice is asymptotically
optimal (it suffices to get the same capacity as
general encodings Q), it is open whether it is also optimal
for every fixed noisy channel, as suggested by our
random search.


New Codes near the Universal Not

Note that values p > 1 in equation (9) are also admissible,
as it defines a completely positive map for all
$0 \le p \le 4/3$. For p > 1 the channel corresponds to a
mixing of the totally depolarizing channel and the best
possible approximation to the "universal not" channel
[10]. In this range of p our method does lead to a new
type of code. By this we mean that in contrast to Knill-Laflamme
theory no error syndrome is corrected: there
is no channel T' such that DT'E is a multiple of the
identity, whereas any corrected error syndrome in the
Knill-Laflamme theory would provide such T'. We establish
this result by our basic iteration, this time fixing
E, D, and considering T' as the variable. On the other
hand, by fixing $E = E_5$ and T, one can also check that
it is not sufficient to just improve the decoder, and keep
the five-bit-code encoding, as suggested by the analogous
classical case of three bit flip channels with flip probability
greater than 1/2.